Cross-Entropy and Linguistic Typology

Author

  • Patrick Juola
Abstract

Patrick Juola, Department of Experimental Psychology, University of Oxford, Oxford, UK OX1 3UD

The idea of "familial relationships" among languages is well-established and accepted, although some controversies persist in a few specific instances. By painstakingly recording and identifying regularities and similarities and comparing these to the historical record, linguists have been able to produce a general "family tree" incorporating most natural languages. We suggest here that much of these trees can be automatically determined by a complementary technique of distributional analysis. Recent work by (Farach et al., 1995) and (Juola, 1997) suggests that Kullback-Leibler divergence (or cross-entropy) can be meaningfully measured from small samples, in some cases as small as only 20 or so words. Using these techniques, we define and measure a distance function between translations of a small corpus (c. 70 words/sample) covering much of the accepted Indo-European family, and reconstruct a relationship tree by hierarchical cluster analysis. The resulting tree shows remarkable similarity to the accepted Indo-European family; this we read as evidence both for the immense power of this measurement technique and for the validity of this kind of mechanical similarity judgement in the identification of typological relationships. Furthermore, this technique is in theory sensitive to different sorts of relationships than more common word-list based methods and may help illuminate these from a different direction.
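The abstract names two steps, a divergence-based distance between text samples and a hierarchical cluster analysis over those distances, but does not give the estimator itself. The following is only a minimal sketch of that pipeline under stated assumptions: it swaps in a character-unigram model with add-one smoothing and a symmetrised Kullback-Leibler divergence rather than the small-sample estimators of Farach et al. or Juola, and it uses plain average-linkage clustering. All function names (`char_model`, `kl_divergence`, `distance`, `cluster`) and the toy strings are hypothetical.

```python
import math
from itertools import combinations
from collections import Counter


def char_model(text, alphabet):
    """Add-one smoothed character distribution of `text` over `alphabet`."""
    counts = Counter(text)
    total = len(text) + len(alphabet)
    return {ch: (counts[ch] + 1) / total for ch in alphabet}


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; p and q share one support."""
    return sum(p[ch] * math.log2(p[ch] / q[ch]) for ch in p)


def distance(a, b):
    """Symmetrised KL divergence between two samples (an assumed stand-in)."""
    # The alphabet is built per pair, so smoothing covers every character
    # that occurs in either sample.
    alphabet = set(a) | set(b)
    pa, pb = char_model(a, alphabet), char_model(b, alphabet)
    return kl_divergence(pa, pb) + kl_divergence(pb, pa)


def cluster(samples):
    """Average-linkage agglomerative clustering.

    `samples` maps a name to its text; the result is a nested tuple of names.
    """
    names = list(samples)
    dist = {
        frozenset((x, y)): distance(samples[x], samples[y])
        for x, y in combinations(names, 2)
    }
    # Each cluster is (subtree, list of leaf names in that subtree).
    clusters = [(name, [name]) for name in names]

    def linkage(c1, c2):
        pairs = [(x, y) for x in c1[1] for y in c2[1]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the two clusters with the smallest average pairwise distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


if __name__ == "__main__":
    # Made-up toy strings, not data from the paper.
    print(cluster({"x": "aaaaab", "y": "aaaabb", "z": "bbbbbc"}))
```

A unigram model ignores character order, so it is far cruder than the estimators the abstract refers to; it is used here only to keep the distance-then-cluster structure visible.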


Similar resources

What is Phonological Typology?

In this talk I am concerned with the following questions: 1. What is phonological typology? 2. How are phonological typology and phonetic typology the same/different? 3. How are phonological typology and general phonology the same/different? 4. How are phonological typology and general typology the same/different? Despite earlier work by Trubetzkoy, Jakobson, Martinet, Greenberg and others, and...


Does MaxEnt Overgenerate? Implicational Universals in Maximum Entropy Grammar

It goes without saying that a good linguistic theory should neither undergenerate (i.e., it should not miss any attested patterns) nor overgenerate (i.e., it should not predict any “unattestable” patterns). Recent literature has argued that the Maximum Entropy (ME; Goldwater & Johnson 2003) framework provides a probabilistic extension of categorical Harmonic Grammar (HG; Legendre et al. 1990; S...


How typology allows for a new analysis of verb phrase in Burmese

Description, classification, diversity and universality of languages are the key terms of linguistic typology, which first aims at answering the following question: what do languages have in common and in what ways do they differ? In other words, typology is concerned with finding properties that are shared by languages (invariants in Lazard terms), and has to do with cross-linguistic comparis...


Jae Jung Song, Linguistic typology: morphology and syntax (Longman

The back cover of this book characterizes it as ‘an up-to-date critical introduction to linguistic typology’. The title might suggest that it gives equal coverage to morphology and syntax, but it is mainly about syntax. Chapter 1, ‘Introducing linguistic typology’, is a useful introduction to typology. For someone new to the area, it lays out the object of linguistic typology and discusses some...


Morphological typology of languages for IR

This paper presents a morphological classification of languages from the IR perspective. Linguistic typology research has shown that the morphological complexity of each language of the world can be described by two variables, index of synthesis and index of fusion. These variables provide a theoretical basis for IR research handling morphological issues. A common theoretical framework is neede...




Publication date: 1998